A Generic Recognition System for Making Archives Documents Accessible to Public
Abstract
This paper presents the annotations needed for content-based retrieval of handwritten archive documents. We propose two complementary ways of producing those annotations: automatically, using optical document recognition, and collectively, through manual input by users over the Internet. A platform for managing those annotations is presented, along with examples of automatic annotations on civil status registers, military forms (tested on 60,000 pages), and naturalization decrees, produced with a generic document recognition method. Examples of collective annotations built on automatic annotations are also...
Similar Resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
Accès par le contenu aux documents manuscrits d'archives numérisés
This paper presents content-based retrieval of handwritten archive documents. This retrieval is built on information (annotations) associated with document images. We propose two complementary ways of producing those annotations: automatically, using optical document recognition, and collectively, through manual input by users over the Internet. A platform for managing those annotations is presented as ...
Enriching Textual Documents with Timecodes from Video Fragments
The OLIVE project aims to develop a multilingual indexing tool for broadcast material based on speech recognition, which automatically produces indexes from the sound track of a program (television or radio). Such a tool allows multimedia archives to be searched by keywords and the corresponding fragments to be retrieved. This paper reports on the alignment module, which is one of th...
Making Indian Language Legacy Documents Accessible Via Web
Reliable optical character recognition is not available for the scripts of Indian languages. Thus, the only way to make legacy documents in Indian languages available on the web is by scanning them. This work is an attempt to cater to the need for a better representation and an efficient storage technique for Indian language documents, and for their near-perfect regeneration at the browser. We work wit...
The Making of a New Medical Specialty: A Policy Analysis of the Development of Emergency Medicine in India
Background: Medical specialization is an understudied, yet growing, aspect of health systems in low- and middle-income countries (LMICs). In India, medical specialization is incrementally, yet significantly, modifying service delivery, workforce distribution, and financing. However, scarce evidence exists in India and other LMICs regar...
Publication date: 2003